Co-occurrence Graphs Applied to Taxonomy Extraction in Scientific and Technical Corpora
Authors
Abstract
Word co-occurrence graphs have been used in computational linguistics mainly for word sense disambiguation and induction but, until very recently, not for the extraction of hypernymy relations, where the most common methodology relies on lexico-syntactic patterns. In this paper, we show that word co-occurrence statistics can be used to extract IS-A relations between entities in scientific and technical corpora. We exploit the fact that word co-occurrence often has a direction: one term may tend to co-occur with another while the reverse very often does not hold. Co-occurrence can therefore be represented as a directed graph, and this graph resembles a taxonomy. We present an experiment with texts randomly extracted from the Spanish Wikipedia, but our findings suggest that this co-occurrence behavior is a macroscopic and intrinsic property of argumentative discourse in general.
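The directional idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure, which the abstract does not specify. It assumes that direction is measured by comparing the conditional co-occurrence rates P(b | a) and P(a | b) over a set of contexts. The function name `directed_cooccurrence`, the thresholds `min_cond` and `min_freq`, and the toy English contexts are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def directed_cooccurrence(contexts, min_cond=0.6, min_freq=2):
    """Return sorted edges (term, candidate_hypernym, P(hypernym | term)).

    Assumption for this sketch: an edge a -> b is added when b appears in
    most contexts that contain a, while a appears in a smaller share of
    the contexts that contain b.
    """
    freq = Counter()   # number of contexts containing each term
    joint = Counter()  # number of contexts containing both terms of a pair
    for tokens in contexts:
        terms = sorted(set(tokens))  # count each term once per context
        freq.update(terms)
        joint.update(combinations(terms, 2))

    edges = []
    for (a, b), n_ab in joint.items():
        if freq[a] < min_freq or freq[b] < min_freq:
            continue  # skip terms that are too rare to judge
        p_b_given_a = n_ab / freq[a]
        p_a_given_b = n_ab / freq[b]
        if p_b_given_a >= min_cond and p_b_given_a > p_a_given_b:
            edges.append((a, b, p_b_given_a))
        elif p_a_given_b >= min_cond and p_a_given_b > p_b_given_a:
            edges.append((b, a, p_a_given_b))
    return sorted(edges)


# Toy contexts (hypothetical data, one token list per sentence or paragraph).
contexts = [
    ["aspirin", "drug"],
    ["aspirin", "drug"],
    ["ibuprofen", "drug"],
    ["ibuprofen", "drug"],
    ["drug"],
    ["drug"],
]
edges = directed_cooccurrence(contexts)
for term, hypernym, score in edges:
    print(f"{term} -> {hypernym} (P = {score:.2f})")
```

In this toy data, "drug" appears in every context that contains "aspirin", but "aspirin" appears in only a third of the contexts that contain "drug", so the edge points from the more specific term to the more general one. On real text the same asymmetry also fires for merely associated terms, so the resulting edges are candidate IS-A relations that need further filtering.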
Similar resources
Word Co-occurrence Counts Prediction for Bilingual Terminology Extraction from Comparable Corpora
Methods for bilingual lexicon extraction from comparable corpora are often based on word co-occurrence observation and are inherently more effective when using large corpora. In most cases, specialized comparable corpora are small, and this has a direct impact on bilingual terminology extraction results. In order to overcome insufficient data coverage and to make ...
Metadiscourse Use in Popular and Professional Science: The Case of Hedges and Boosters
The present article shows that all scientific texts included in journals, magazines, and newspapers are vulnerable to the penetration of hedges and boosters. However, it was found that scientific texts in the three corpora tended to open up the possibilities of alternative voices rather than narrowing them down. The relatively higher frequency of occurrence of hedges in comparison with booster...
Bidirectional Extraction of Phrases for Expanding Queries in Academic Paper Retrieval
This paper proposes a new method for query expansion based on bidirectional extraction of phrases as word n-grams from research paper titles. The proposed method aims to extract information relevant to users’ needs and interests and thus to provide a useful system for technical paper retrieval. The outcome of the proposed method is a set of trigrams that can be used as phrases for query expansion. Firs...
Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation
This paper proposes a novel method for lexicon extraction that extracts translation pairs from comparable corpora by using graph-based label propagation. In previous work, it was established that performance drastically decreases when the coverage of a seed lexicon is small. We resolve this problem by utilizing indirect relations with the bilingual seeds together with direct relations, in which ...
Mapping the Scientific Structure of Iranian Brucellosis Researches Using the Co-authorship and Co-occurrence Network Analysis
Background and Objective: The evaluation of the publishing trend of articles in various scientific fields provides an insight into the efforts of researchers in the field of knowledge. Accordingly, the present study has evaluated and analyzed the scientific publications on brucellosis conducted by Iranian researchers using scientometrics methods and analysis of social networks. Methods: The pr...
Journal title:
- Procesamiento del Lenguaje Natural
Volume 49, Issue
Pages -
Publication date 2012